ABSTRACT

To date, no theoretical results have been developed to predict the performance of the proportionate normalized least mean square (PNLMS) algorithm or any of its cousin algorithms, such as the μ-law PNLMS (MPNLMS) and the ε-law PNLMS (EPNLMS). In this paper we develop an analytic approach to predicting the performance of the simplified PNLMS algorithm, which is closely related to the PNLMS algorithm. In particular, we demonstrate that our theory can predict the Mean Square Output Error of the simplified PNLMS algorithm.

Index Terms: Adaptive filtering, convergence, proportionate-type normalized least mean square (PtNLMS) algorithm, sparse impulse response.

1. INTRODUCTION 

We begin by assuming that an input signal, denoted $x(k)$ at time $k$, excites an unknown system with impulse response $\mathbf{w}_{\mathrm{opt}}$. Let the output of the system be $y(k) = \mathbf{w}_{\mathrm{opt}}^T\mathbf{x}(k)$, where $\mathbf{x}(k) = [x(k), x(k-1), \ldots, x(k-L+1)]^T$. The measured output of the system, $d(k)$, is the sum of $y(k)$ and zero-mean stationary measurement noise $v(k)$. The impulse response of the system is estimated with the adaptive filter coefficient vector $\hat{\mathbf{w}}(k)$. The error signal $e(k)$ between the output of the adaptive filter, $\hat{y}(k)$, and $d(k)$ drives the adaptive algorithm. The weight deviation (WD) vector is given by $\mathbf{z}(k) = \mathbf{w}_{\mathrm{opt}} - \hat{\mathbf{w}}(k)$. Table 1 shows the normalized least mean square (NLMS) algorithm with an arbitrary time-varying stepsize control matrix, as given in [1]. Here, $\beta$ is the fixed stepsize parameter, $\mathbf{G}(k+1) = \mathrm{diag}\{g_1(k+1), \ldots, g_L(k+1)\}$ is the time-varying stepsize control matrix, and $L$ is the length of the adaptive filter. The constant $\delta$ is typically a small positive number that prevents overflow when $\mathbf{x}^T(k)\mathbf{G}(k+1)\mathbf{x}(k)$ is close to zero.
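The update summarized in Table 1 translates directly into a few lines of NumPy. This is a minimal sketch. The function name `ptnlms_step`, the default value of `delta`, and the choice to pass the diagonal of $\mathbf{G}(k+1)$ as a vector `g` are assumptions made for illustration and are not part of the paper.

```python
import numpy as np

def ptnlms_step(w_hat, x_vec, d, g, beta, delta=1e-4):
    """One iteration of the NLMS algorithm with an arbitrary diagonal stepsize matrix (Table 1).

    w_hat : current estimate w_hat(k), shape (L,)
    x_vec : regressor x(k) = [x(k), x(k-1), ..., x(k-L+1)], shape (L,)
    d     : measured output d(k)
    g     : diagonal of G(k+1), shape (L,)
    """
    y_hat = x_vec @ w_hat                   # adaptive filter output
    e = d - y_hat                           # error signal e(k)
    denom = x_vec @ (g * x_vec) + delta     # x^T(k) G(k+1) x(k) + delta
    w_next = w_hat + beta * g * x_vec * e / denom
    return w_next, e
```

With `g = np.ones(L)` the step reduces to standard NLMS. A proportionate-type algorithm differs only in how `g` is computed from the current estimate.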

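Table 1. NLMS Algorithm with Arbitrary Stepsize Matrix

$$
\begin{aligned}
\mathbf{x}(k) &= [x(k), x(k-1), \ldots, x(k-L+1)]^T \\
\hat{y}(k) &= \mathbf{x}^T(k)\,\hat{\mathbf{w}}(k) \\
e(k) &= d(k) - \hat{y}(k) \\
\mathbf{G}(k+1) &= \mathrm{diag}\{g_1(k+1), \ldots, g_L(k+1)\} \\
\hat{\mathbf{w}}(k+1) &= \hat{\mathbf{w}}(k) + \frac{\beta\,\mathbf{G}(k+1)\,\mathbf{x}(k)\,e(k)}{\mathbf{x}^T(k)\,\mathbf{G}(k+1)\,\mathbf{x}(k) + \delta}
\end{aligned}
$$

Next, we seek a representation of the Mean Square Output Error (MSE), i.e., the learning curve, for the proportionate-type normalized least mean square (PtNLMS) algorithm [2]. The MSE is given by $J(k) = E\{|e(k)|^2\}$. We expand the $e(k)$ term and assume that the input signal is white, i.e., $\mathbf{R} = E\{\mathbf{x}(k)\mathbf{x}^T(k)\} = \sigma_x^2\mathbf{I}$. We further assume that $\beta$ is so small that the LMS coefficient estimator acts as a low-pass filter. Under these assumptions the MSE can be rewritten in the following form [4]: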

$$
J(k) = J_{min} + \sigma_x^2 \sum_{i=1}^{L} E\{z_i^2(k)\}
$$

where the first term, $J_{min}$, is equal to the variance of the noise, $\sigma_v^2$, and $z_i(k)$ are the elements of $\mathbf{z}(k)$. Hence, to calculate the MSE we need the expected values of the squared weight deviations $z_i^2(k)$.
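Once the mean square WD values are available, the expression above is a one-liner. Below is a minimal sketch; the function name is hypothetical. At $k = 0$ with a zero initial estimate, $E\{z_i^2(0)\} = w_i^2$, so $J(0) = \sigma_v^2 + \sigma_x^2\|\mathbf{w}_{\mathrm{opt}}\|^2$.

```python
import numpy as np

def theoretical_mse(mean_sq_wd, sigma_x2, sigma_v2):
    """J(k) = J_min + sigma_x^2 * sum_i E{z_i^2(k)}, with J_min = sigma_v^2 (white input).

    mean_sq_wd : E{z_i^2(k)} along the last axis; a 2-D array of shape (K, L)
                 yields the whole learning curve J(0), ..., J(K-1).
    """
    return sigma_v2 + sigma_x2 * np.sum(mean_sq_wd, axis=-1)
```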

We now consider the MSE for specific proportionate-type NLMS algorithms. Many of these, such as the PNLMS [3], MPNLMS [1], and EPNLMS [2], involve highly non-linear (threshold-based) operations. To simplify the derivation of analytical results, this paper examines a simplified PNLMS algorithm. The calculation of the gain for the simplified PNLMS algorithm is given in Table 2. The simplified PNLMS algorithm avoids the maximum function that the PNLMS, MPNLMS, and EPNLMS algorithms employ.
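Table 2 is not reproduced in this section. The sketch below assumes the gain form on which the expectations (16)-(20) later in the paper are built: $F_i(k) = \rho + |\hat{w}_i(k)|$ and $g_i(k+1) = F_i(k) / \big(\frac{1}{L}\sum_{j} F_j(k)\big)$. The function name is hypothetical.

```python
import numpy as np

def simplified_pnlms_gains(w_hat, rho):
    """Assumed simplified PNLMS gains g_i(k+1) = F_i(k) / mean_j F_j(k), F_i(k) = rho + |w_hat_i(k)|.

    No max() is involved, so the gains are a smooth function of |w_hat|.
    Normalizing by the mean makes the gains sum to L.
    """
    F = rho + np.abs(w_hat)
    return F / F.mean()
```

Large coefficients receive proportionally larger stepsizes. The term $\rho$ keeps the gains of zero-valued taps, including every tap of an all-zero initial estimate, strictly positive.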
2. RECURSIVE CALCULATION OF THE MEAN WD AND MEAN SQUARE WD

We can express the WD at time $k+1$ in terms of the WD at time $k$ using the recursion for the estimated optimal coefficient vector. With the convention $x_i(k) = x(k-i+1)$, this recursion in component-wise form is given by

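$$
z_i(k+1) = z_i(k) - \frac{\beta\,g_i(k+1)\,x_i(k)\sum_{j=1}^{L} x_j(k)\,z_j(k)}{\mathbf{x}^T(k)\mathbf{G}(k+1)\mathbf{x}(k) + \delta} - \frac{\beta\,g_i(k+1)\,x_i(k)\,v(k)}{\mathbf{x}^T(k)\mathbf{G}(k+1)\mathbf{x}(k) + \delta} \tag{1}
$$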

The component-wise form of the recursion for the squared WD is given by

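$$
\begin{aligned}
z_i^2(k+1) = {} & z_i^2(k) - \frac{2\beta\,g_i(k+1)\,x_i(k)\sum_{j=1}^{L} x_j(k)\,z_j(k)\,z_i(k)}{\mathbf{x}^T(k)\mathbf{G}(k+1)\mathbf{x}(k) + \delta} - \frac{2\beta\,g_i(k+1)\,x_i(k)\,v(k)\,z_i(k)}{\mathbf{x}^T(k)\mathbf{G}(k+1)\mathbf{x}(k) + \delta} \\
& + \frac{\beta^2 g_i^2(k+1)\,x_i^2(k)\sum_{j=1}^{L}\sum_{m=1}^{L} x_j(k)\,x_m(k)\,z_j(k)\,z_m(k)}{\left(\mathbf{x}^T(k)\mathbf{G}(k+1)\mathbf{x}(k) + \delta\right)^2} + \frac{\beta^2 g_i^2(k+1)\,x_i^2(k)\,v^2(k)}{\left(\mathbf{x}^T(k)\mathbf{G}(k+1)\mathbf{x}(k) + \delta\right)^2} \\
& + \frac{2\beta^2 g_i^2(k+1)\,x_i^2(k)\sum_{j=1}^{L} x_j(k)\,z_j(k)\,v(k)}{\left(\mathbf{x}^T(k)\mathbf{G}(k+1)\mathbf{x}(k) + \delta\right)^2}
\end{aligned}
\tag{2}
$$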
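A quick numerical check confirms two facts. First, (1) is simply the Table 1 update rewritten for $\mathbf{z}(k)$. Second, the six terms of (2) sum to the exact square of (1). All values below are arbitrary random numbers chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
L, beta, delta = 8, 0.5, 1e-4
w_opt = rng.standard_normal(L)          # unknown system (illustrative values)
w_hat = rng.standard_normal(L)          # current estimate w_hat(k)
x_vec = rng.standard_normal(L)          # regressor x(k)
v = 0.01 * rng.standard_normal()        # measurement noise v(k)
g = rng.uniform(0.5, 1.5, L)            # arbitrary positive gains g_i(k+1)

# Table 1 update written for the coefficients
e = x_vec @ w_opt + v - x_vec @ w_hat
denom = x_vec @ (g * x_vec) + delta
w_next = w_hat + beta * g * x_vec * e / denom

# Recursion (1) written directly for the weight deviation z = w_opt - w_hat
z = w_opt - w_hat
z_next = z - beta * g * x_vec * (x_vec @ z) / denom - beta * g * x_vec * v / denom

# The six terms of (2); s = sum_j x_j z_j, so the double sum equals s**2
s = x_vec @ z
z2_next = (z ** 2
           - 2 * beta * g * x_vec * s * z / denom
           - 2 * beta * g * x_vec * v * z / denom
           + beta ** 2 * g ** 2 * x_vec ** 2 * s ** 2 / denom ** 2
           + beta ** 2 * g ** 2 * x_vec ** 2 * v ** 2 / denom ** 2
           + 2 * beta ** 2 * g ** 2 * x_vec ** 2 * s * v / denom ** 2)
```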

Next we take the expected values of the WD and the squared WD. To do so, we make the following set of assumptions.

Assumption I: The adaptation stepsize parameter $\beta$ is sufficiently small that the LMS coefficient estimator acts as a low-pass filter. Hence, $z_i(k)$ changes slowly relative to $x_i(k)$.

Assumption II: The input signal and the observation noise are uncorrelated. This assumption is justified provided that the linear model of the unknown system is applicable and the length of the Wiener optimal solution for the adaptive filter exactly equals the order of the unknown system.

Assumption III: The expectation of a ratio of two random variables equals the ratio of their expectations. In our case the denominator of interest is typically the term $\mathbf{x}^T(k)\mathbf{G}(k+1)\mathbf{x}(k) + \delta$. This assumption holds if the denominator is nearly constant, or if the filter length $L$ is large enough to satisfy the condition given in [5]. We can derive the expectation of the denominator term by writing it in component-wise form and applying Assumption I [5]:

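$$
E\{\mathbf{x}^T(k)\mathbf{G}(k+1)\mathbf{x}(k) + \delta\} = E\Big\{\sum_{j=1}^{L} x_j^2(k)\,g_j(k+1) + \delta\Big\} \approx E\Big\{\sum_{j=1}^{L} E\{x_j^2(k)\}\,g_j(k+1) + \delta\Big\} = \sigma_x^2 L + \delta \tag{3}
$$

where the last step uses $E\{x_j^2(k)\} = \sigma_x^2$ and the normalization $\sum_{j=1}^{L} g_j(k+1) = L$ of the proportionate gains.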


Simulations have confirmed that this assumption holds in the situations discussed in this paper. However, when the parameter $\rho$ of the simplified PNLMS gain is very small ($\rho < 10^{-\ldots}$), the experiments show that the assumption does not hold. Most real-world applications use larger values of $\rho$, so this is not an issue in practice.
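Equation (3) can be checked by simulation. The sketch below draws white Gaussian regressors and builds a fixed, mostly sparse gain vector normalized to sum to $L$. It then compares the sample mean of the denominator with $\sigma_x^2 L + \delta$. The gain form and all numbers are illustrative assumptions, not the paper's settings. The variable `rel_spread` measures how close the denominator is to constant, which is the condition Assumptions III and IV rely on.

```python
import numpy as np

rng = np.random.default_rng(2)
L, sigma_x2, delta, trials = 512, 1e-2, 1e-4, 2000          # illustrative values
active = rng.random(L) < 0.05                                # a few "large" taps
F = 0.01 + np.abs(rng.standard_normal(L)) * active           # assumed form rho + |w_hat_i|
g = F / F.mean()                                             # gains sum to L

x = rng.normal(0.0, np.sqrt(sigma_x2), size=(trials, L))     # white input, one regressor per row
denominators = (x ** 2) @ g + delta                          # x^T G x + delta for each trial
empirical = denominators.mean()
predicted = sigma_x2 * L + delta                             # right-hand side of (3)
rel_spread = denominators.std() / empirical                  # near 0 => nearly constant denominator
```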

Assumption IV: The expectation of the squared denominator term equals the square of the expectation of the denominator. This assumption leads to

$$
E\{(\mathbf{x}^T(k)\mathbf{G}(k+1)\mathbf{x}(k) + \delta)^2\} = (\sigma_x^2 L + \delta)^2.
$$

It holds if the denominator is nearly constant.

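Therefore, the expectation of the WD can be found recursively from the prior time step by

$$
E\{z_i(k+1)\} = E\{z_i(k)\} - \beta_0\,E\{g_i(k+1)\,z_i(k)\} \tag{4}
$$

where $\beta_0 = \dfrac{\beta\sigma_x^2}{\sigma_x^2 L + \delta}$. The factor $\sigma_x^2$ arises because, for white input, $E\{x_i(k)x_j(k)\}$ equals $\sigma_x^2$ for $j = i$ and zero otherwise. The noise term of (1) vanishes because $v(k)$ is zero-mean and uncorrelated with the input.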

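Similarly, based on our assumptions, the expected value of the squared WD is given by

$$
E\{z_i^2(k+1)\} = E\{z_i^2(k)\} - 2\beta_0\,E\{g_i(k+1)\,z_i^2(k)\} + \beta_0^2\,E\Big\{g_i^2(k+1)\sum_{j=1}^{L} z_j^2(k)\Big\} + \beta_0^2\,\frac{\sigma_v^2}{\sigma_x^2}\,E\{g_i^2(k+1)\}. \tag{5}
$$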
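Recursions (4) and (5) can be transcribed directly once the expectations of the form (6) are supplied. The two approaches described next differ only in how those expectations are obtained. Below is a minimal sketch; the function names are hypothetical, and the gain/WD expectations are passed in as arrays.

```python
import numpy as np

def beta0(beta, sigma_x2, L, delta):
    """Effective stepsize beta_0 = beta * sigma_x^2 / (sigma_x^2 L + delta)."""
    return beta * sigma_x2 / (sigma_x2 * L + delta)

def wd_moment_step(Ez, Ez2, Egz, Egz2, Eg2_sumz2, Eg2, b0, sigma_x2, sigma_v2):
    """One step of (4) and (5).

    Egz       : E{g_i(k+1) z_i(k)}
    Egz2      : E{g_i(k+1) z_i^2(k)}
    Eg2_sumz2 : E{g_i^2(k+1) sum_j z_j^2(k)}
    Eg2       : E{g_i^2(k+1)}
    """
    Ez_next = Ez - b0 * Egz
    Ez2_next = (Ez2 - 2 * b0 * Egz2 + b0 ** 2 * Eg2_sumz2
                + b0 ** 2 * (sigma_v2 / sigma_x2) * Eg2)
    return Ez_next, Ez2_next
```

With all gains fixed at one, the inputs reduce to `Egz = Ez`, `Egz2 = Ez2`, `Eg2 = 1`, and the step becomes an NLMS-style mean-square recursion.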

At this point we can, in principle, recursively estimate the expected values of the WD and squared WD vectors. One issue remains: the calculation of terms such as

$$
E\{g_i^n(k+1)\,z_j^m(k)\} \tag{6}
$$

for $n \in \{1, 2\}$, $m \in \{0, 1, 2\}$ and $i, j \in \{1, 2, \ldots, L\}$.

We assume that

$$
E\{g_i^n(k+1)\,z_j^m(k)\} = E\{g_i^n(k+1)\}\,E\{z_j^m(k)\} \quad \text{if } i \neq j.
$$

Now we can take two approaches to calculating the expectation for $i = j$. In the first approach we assume that the expectation of the product of $g_i^n(k+1)$ and $z_i^m(k)$ is separable. In addition, we assume that the expectation of the product of the gains equals the product of the expectations of the gains (this holds when $g_i(k+1)$ varies slowly), that is,

$$
E\{g_i^n(k+1)\} = E\{g_i(k+1)\}^n. \tag{7}
$$

Therefore we have

$$
E\{g_i^n(k+1)\,z_i^m(k)\} = E\{g_i(k+1)\}^n\,E\{z_i^m(k)\}.
$$

We call this the ‘Separable Approach’.

Alternatively, we can calculate the expectations in (6) explicitly. We refer to this as the ‘Non-Separable Approach’. In the next section we develop the probability distributions and expressions needed for both approaches.
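A small Monte Carlo sketch shows why the factorization can matter. It holds the gain denominator fixed at a hypothetical value `mean_F`, which is an illustrative simplification. It then draws $z_i(k)$ from the Gaussian model introduced in Section 3. The gap $E\{g_i z_i\} - E\{g_i\}E\{z_i\}$ then equals $\mathrm{Cov}\{|\hat{w}_i|, z_i\}/\texttt{mean\_F}$. For Gaussian $\hat{w}_i$ this covariance works out to $-\sigma_i^2\,\mathrm{erf}\big(m_i/\sqrt{2\sigma_i^2}\big)$, and the separable approach drops it. All numbers are illustrative.

```python
import math
import numpy as np

rng = np.random.default_rng(3)
w_i, mu_i, var_i = 0.1, 0.05, 0.05 ** 2      # illustrative coefficient, WD mean and variance
rho, mean_F, n = 0.01, 0.1, 1_000_000         # mean_F: hypothetical fixed gain denominator

z = rng.normal(mu_i, math.sqrt(var_i), n)     # z_i(k) ~ N(mu_i, var_i)
g = (rho + np.abs(w_i - z)) / mean_F          # gain with its denominator held fixed

direct = np.mean(g * z)                       # E{g_i z_i}, estimated directly
separable = np.mean(g) * np.mean(z)           # E{g_i} E{z_i}, the separable factorization
m_i = w_i - mu_i
predicted_gap = -var_i * math.erf(m_i / math.sqrt(2 * var_i)) / mean_F
```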





3. RECURSIVE CALCULATION OF EXPECTATIONS

We begin by assuming that each component of the weight deviation at time $k$ is normally distributed with mean $\mu_i(k)$ and variance $\sigma_i^2(k)$, i.e.,

$$
z_i(k) \sim \mathcal{N}\big(\mu_i(k), \sigma_i^2(k)\big).
$$

This assumption is motivated by the possibility of applying the central limit theorem to the recursion for the weight deviation in (1), and it is supported by simulations. Under this assumption, each component of the estimated optimal weight vector is distributed as

$$
\hat{w}_i(k) = w_i - z_i(k) \sim \mathcal{N}\big(m_i(k), \sigma_i^2(k)\big)
$$

where $w_i$ is the $i$th element of $\mathbf{w}_{\mathrm{opt}}$ and $m_i(k) = w_i - \mu_i(k)$. The p.d.f. of $|\hat{w}_i(k)|$ is given by


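$$
f(|\hat{w}_i(k)|) = \frac{1}{\sqrt{2\pi\sigma_i^2(k)}}\left[\exp\left(-\frac{(|\hat{w}_i(k)| - m_i(k))^2}{2\sigma_i^2(k)}\right) + \exp\left(-\frac{(|\hat{w}_i(k)| + m_i(k))^2}{2\sigma_i^2(k)}\right)\right] U(|\hat{w}_i(k)|) \tag{8}
$$

where $U(x)$ is the unit step function [6].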

We now use the form of this p.d.f. to calculate several expectations needed in later derivations. We begin with the mean of this distribution, which is given by

$$
E\{|\hat{w}_i(k)|\} = m_i(k)\,\mathrm{erf}\left(\frac{m_i(k)}{\sqrt{2\sigma_i^2(k)}}\right) + \sqrt{\frac{2}{\pi}}\,\sigma_i(k)\,e^{-\frac{m_i^2(k)}{2\sigma_i^2(k)}}. \tag{9}
$$


Additionally, the second moment is given by

$$
E\{|\hat{w}_i(k)|^2\} = m_i^2(k) + \sigma_i^2(k). \tag{10}
$$
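The folded-normal density (8) and the moments (9)-(10) can be cross-checked by numerical integration. The following minimal sketch uses hypothetical function names. It floors the variance at a tiny value because $\sigma_i^2(0) = 0$ when the recursion starts from $\hat{\mathbf{w}}(0) = \mathbf{0}$.

```python
import math
import numpy as np

def folded_normal_pdf(y, m, var):
    """p.d.f. (8) of |w_hat_i(k)| when w_hat_i(k) ~ N(m, var); zero for y < 0 (unit step)."""
    y = np.asarray(y, dtype=float)
    c = 1.0 / math.sqrt(2.0 * math.pi * var)
    pdf = c * (np.exp(-(y - m) ** 2 / (2 * var)) + np.exp(-(y + m) ** 2 / (2 * var)))
    return np.where(y >= 0, pdf, 0.0)

def abs_weight_moments(m, var):
    """E{|w_hat_i|} from (9) and E{|w_hat_i|^2} from (10), with m = w_i - mu_i."""
    s = math.sqrt(max(var, 1e-300))  # guard: var = 0 at k = 0
    mean_abs = (m * math.erf(m / (math.sqrt(2.0) * s))
                + math.sqrt(2.0 / math.pi) * s * math.exp(-m * m / (2.0 * s * s)))
    return mean_abs, m * m + var
```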

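We can also calculate the following expectations:

$$
E\{|\hat{w}_i(k)|\,(w_i - \hat{w}_i(k))\} = \left(w_i\mu_i(k) - \sigma_i^2(k) - \mu_i^2(k)\right)\mathrm{erf}\left(\frac{m_i(k)}{\sqrt{2\sigma_i^2(k)}}\right) + \frac{2\sigma_i(k)\,\mu_i(k)}{\sqrt{2\pi}}\,e^{-\frac{m_i^2(k)}{2\sigma_i^2(k)}} \tag{11}
$$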


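$$
\begin{aligned}
E\{|\hat{w}_i(k)|\,(w_i - \hat{w}_i(k))^2\} = {} & \left(w_i\mu_i^2(k) + w_i\sigma_i^2(k) - 3\mu_i(k)\sigma_i^2(k) - \mu_i^3(k)\right)\mathrm{erf}\left(\frac{m_i(k)}{\sqrt{2\sigma_i^2(k)}}\right) \\
& + \left(2\mu_i^2(k) + 4\sigma_i^2(k)\right)\frac{\sigma_i(k)}{\sqrt{2\pi}}\,e^{-\frac{m_i^2(k)}{2\sigma_i^2(k)}}
\end{aligned}
\tag{12}
$$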


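$$
E\{|\hat{w}_i(k)|^2\,(w_i - \hat{w}_i(k))^2\} = w_i^2\left(\mu_i^2(k) + \sigma_i^2(k)\right) - 2w_i\left(\mu_i^3(k) + 3\mu_i(k)\sigma_i^2(k)\right) + \mu_i^4(k) + 6\mu_i^2(k)\sigma_i^2(k) + 3\sigma_i^4(k) \tag{13}
$$

3.1. Separable Expectation Calculations

In the separable case, the expectations of the WD and the squared WD are given by

$$
E\{z_i(k+1)\} = E\{z_i(k)\} - \beta_0\,E\{g_i(k+1)\}\,E\{z_i(k)\} \tag{14}
$$

$$
E\{z_i^2(k+1)\} = E\{z_i^2(k)\} - 2\beta_0\,E\{g_i(k+1)\}\,E\{z_i^2(k)\} + \beta_0^2\,E\{g_i(k+1)\}^2\sum_{j=1}^{L} E\{z_j^2(k)\} + \beta_0^2\,\frac{\sigma_v^2}{\sigma_x^2}\,E\{g_i(k+1)\}^2 \tag{15}
$$

respectively. Note that $\sigma_i^2(k) = E\{z_i^2(k)\} - E^2\{z_i(k)\}$. It remains to find $E\{g_i(k+1)\}$. Using Assumption III, this term can be found as

$$
E\{g_i(k+1)\} = E\left\{\frac{F_i(k)}{\frac{1}{L}\sum_{j=1}^{L} F_j(k)}\right\} \approx \frac{\rho + E\{|\hat{w}_i(k)|\}}{\frac{1}{L}\sum_{j=1}^{L}\left(\rho + E\{|\hat{w}_j(k)|\}\right)} \tag{16}
$$

where $F_i(k) = \rho + |\hat{w}_i(k)|$.

The algorithm is initialized with $E\{z_i(0)\} = w_i$ and $E\{z_i^2(0)\} = w_i^2$, which corresponds to $\hat{\mathbf{w}}(0) = \mathbf{0}$.

3.2. Non-Separable Expectation Calculations

To calculate the mean WD and the mean square WD, we find:

$$
E\{g_i(k+1)\,z_i(k)\} \approx \frac{\rho\,E\{z_i(k)\} + E\{|\hat{w}_i(k)|\,(w_i - \hat{w}_i(k))\}}{\frac{1}{L}\sum_{j=1}^{L}\left(\rho + E\{|\hat{w}_j(k)|\}\right)} \tag{17}
$$

$$
E\{g_i(k+1)\,z_i^2(k)\} \approx \frac{\rho\,E\{z_i^2(k)\} + E\{|\hat{w}_i(k)|\,(w_i - \hat{w}_i(k))^2\}}{\frac{1}{L}\sum_{j=1}^{L}\left(\rho + E\{|\hat{w}_j(k)|\}\right)} \tag{18}
$$

$$
\begin{aligned}
E\{g_i^2(k+1)\,z_i^2(k)\} &= E\left\{\frac{\left(\rho + |w_i - z_i(k)|\right)^2 z_i^2(k)}{\left(\frac{1}{L}\sum_{j=1}^{L}\left(\rho + |w_j - z_j(k)|\right)\right)^2}\right\} \\
&\approx \frac{\rho^2 E\{z_i^2(k)\} + 2\rho\,E\{|\hat{w}_i(k)|\,(w_i - \hat{w}_i(k))^2\} + E\{|\hat{w}_i(k)|^2\,(w_i - \hat{w}_i(k))^2\}}{\left(\frac{1}{L}\sum_{j=1}^{L}\left(\rho + E\{|\hat{w}_j(k)|\}\right)\right)^2}
\end{aligned}
\tag{19}
$$

$$
E\{g_i^2(k+1)\} = E\left\{\frac{\left(\rho + |w_i - z_i(k)|\right)^2}{\left(\frac{1}{L}\sum_{j=1}^{L}\left(\rho + |w_j - z_j(k)|\right)\right)^2}\right\} \approx \frac{\rho^2 + 2\rho\,E\{|\hat{w}_i(k)|\} + E\{|\hat{w}_i(k)|^2\}}{\left(\frac{1}{L}\sum_{j=1}^{L}\left(\rho + E\{|\hat{w}_j(k)|\}\right)\right)^2} \tag{20}
$$

All of these terms can be calculated using equations (9)-(13).

4. RESULTS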


We now compare the theory derived above with the results of Monte Carlo simulations. Unless specified otherwise, the simulations and figures use the following parameters: $L = 512$, $\sigma_x^2 = 10^{-\ldots}$, $\sigma_v^2 = 10^{-\ldots}$, and $\delta = 10^{-\ldots}$. To measure quantitatively how well the theory fits the ensemble-averaged results, we use the following metric:

$$
C = \frac{\sum_k \left|e_T^2(k) - e_{MC}^2(k)\right|}{\cdots}
$$

where $e_T^2(k)$ is the squared output error generated by the theory at time $k$ and $e_{MC}^2(k)$ is the squared output error generated by the ensemble average at time $k$. The denominator was added in an attempt to make the metric independent of the input signal power.

We compare the performance of the ‘Separable Approach’ theory with that of the ‘Non-Separable Approach’ theory, using the echo-path impulse response presented in [7]. This impulse response is sparse because very few of its coefficients are non-zero. Figure 1 shows the performance of the ‘Separable Approach’ theory for $\rho = 10^{-\ldots}$, and Figure 2 shows the results of the ‘Non-Separable Approach’ theory for $\rho = 10^{-\ldots}$. The ‘Non-Separable Approach’ theory performs slightly better than the ‘Separable Approach’ theory. The metric $C$ reflects this improvement: it drops from 0.14631 to 0.11011 when the ‘Non-Separable Approach’ theory is applied.
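A reduced-size version of this comparison can be sketched in code. The sketch iterates the separable recursions (14)-(16) and runs an ensemble of simplified PNLMS filters for comparison. It rests on several assumptions. The input is white and Gaussian. The gain is $F_i(k) = \rho + |\hat{w}_i(k)|$ normalized by its mean, since Table 2 is not reproduced here. The parameters are illustrative: a 16-tap system with two nonzero coefficients, $\beta = 0.5$, $\rho = 0.01$. These are not the paper's settings, and the sketch does not use the echo-path response of [7]. Function names are hypothetical.

```python
import math
import numpy as np

erf = np.vectorize(math.erf)  # elementwise error function without requiring SciPy


def separable_theory(w, beta, rho, sigma_x2, sigma_v2, delta, n_iter):
    """Theoretical learning curve J(k) from (14)-(16), using (9) for E{|w_hat_i(k)|}."""
    L = w.size
    b0 = beta * sigma_x2 / (sigma_x2 * L + delta)
    Ez, Ez2 = w.copy(), w ** 2                    # E{z_i(0)} = w_i, E{z_i^2(0)} = w_i^2
    J = np.empty(n_iter)
    for k in range(n_iter):
        J[k] = sigma_v2 + sigma_x2 * Ez2.sum()
        var = np.maximum(Ez2 - Ez ** 2, 1e-300)   # sigma_i^2(k), floored to avoid 0/0
        m = w - Ez                                # m_i(k) = w_i - mu_i(k)
        s = np.sqrt(var)
        e_abs = (m * erf(m / (math.sqrt(2.0) * s))
                 + math.sqrt(2.0 / math.pi) * s * np.exp(-m ** 2 / (2.0 * var)))   # (9)
        Eg = (rho + e_abs) / np.mean(rho + e_abs)                                   # (16)
        Ez, Ez2 = (Ez - b0 * Eg * Ez,                                               # (14)
                   Ez2 - 2 * b0 * Eg * Ez2 + b0 ** 2 * Eg ** 2 * Ez2.sum()
                   + b0 ** 2 * (sigma_v2 / sigma_x2) * Eg ** 2)                     # (15)
    return J


def monte_carlo(w, beta, rho, sigma_x2, sigma_v2, delta, n_iter, runs, seed=0):
    """Ensemble-averaged e^2(k) of the simplified PNLMS algorithm, vectorized over runs."""
    rng = np.random.default_rng(seed)
    L = w.size
    w_hat = np.zeros((runs, L))                   # zero initial estimate in every run
    x = np.zeros((runs, L))                       # tapped-delay-line regressors
    e2 = np.empty(n_iter)
    for k in range(n_iter):
        x = np.roll(x, 1, axis=1)
        x[:, 0] = rng.normal(0.0, math.sqrt(sigma_x2), runs)     # white Gaussian input
        d = x @ w + rng.normal(0.0, math.sqrt(sigma_v2), runs)   # d(k) = y(k) + v(k)
        e = d - np.sum(x * w_hat, axis=1)                        # e(k) = d(k) - y_hat(k)
        F = rho + np.abs(w_hat)                                   # assumed simplified PNLMS gain
        g = F / F.mean(axis=1, keepdims=True)
        denom = np.sum(g * x * x, axis=1) + delta                 # x^T G x + delta
        w_hat += beta * g * x * (e / denom)[:, None]
        e2[k] = np.mean(e ** 2)
    return e2


w = np.zeros(16)
w[2], w[9] = 1.0, -0.5
params = dict(beta=0.5, rho=0.01, sigma_x2=1.0, sigma_v2=1e-3, delta=1e-4, n_iter=600)
J = separable_theory(w, **params)
e2 = monte_carlo(w, runs=200, **params)
```

In this small example, the steady-state levels of `J` and `e2` are expected to be of the same order. Reproducing the paper's figures would require its echo-path response, $L = 512$, and its parameter values.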


[Figure 1: ensemble-averaged SE vs. theory MSE as a function of time]

Fig. 1. Learning curve of the simplified PNLMS algorithm for $\rho = 10^{-\ldots}$ using the ‘Separable Approach’ theory.

5. CONCLUSIONS 

We have developed two analytical methods to predict the performance of the simplified PNLMS algorithm, based on recursions for the mean weight deviation and the mean square weight deviation. The weight deviation is assumed to have a Gaussian distribution. In the first method, the expectation of the product of the gain and the weight deviation is assumed to be separable. In the second method, this expectation is derived without assuming separability. The second method is more computationally intensive but predicts the performance of the simplified PNLMS algorithm somewhat better. Further analysis shows that the improvement comes mainly from calculating $E\{g_i^2(k+1)\}$ directly instead of relying on the assumption in (7).
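The term responsible for this improvement can be isolated. Subtracting the square of (16) from (20) leaves $\big(E\{|\hat{w}_i(k)|^2\} - E^2\{|\hat{w}_i(k)|\}\big) / \big(\frac{1}{L}\sum_j(\rho + E\{|\hat{w}_j(k)|\})\big)^2$. This is the variance of $|\hat{w}_i(k)|$ scaled by the squared denominator, and assumption (7) sets it to zero. Below is a minimal sketch with a hypothetical function name.

```python
def gain_second_moment(e_abs, e_abs2, mean_F, rho):
    """Return (direct, separable) estimates of E{g_i^2(k+1)}.

    e_abs, e_abs2 : E{|w_hat_i(k)|} and E{|w_hat_i(k)|^2}, e.g. from (9) and (10)
    mean_F        : (1/L) * sum_j (rho + E{|w_hat_j(k)|}), the common denominator in (16)-(20)
    """
    direct = (rho ** 2 + 2 * rho * e_abs + e_abs2) / mean_F ** 2   # (20)
    separable = ((rho + e_abs) / mean_F) ** 2                      # (16) squared, i.e. assumption (7)
    return direct, separable
```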


[Figure 2: ensemble-averaged SE vs. theory MSE]

Fig. 2. Learning curve of the simplified PNLMS algorithm for $\rho = 10^{-\ldots}$ using the ‘Non-Separable Approach’ theory.